AllLife Bank is a US bank that has a growing customer base. The majority of these customers are liability customers (depositors) with varying sizes of deposits. The number of customers who are also borrowers (asset customers) is quite small, and the bank is interested in expanding this base rapidly to bring in more loan business and in the process, earn more through the interest on loans. In particular, the management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors).
A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise campaigns with better-targeted marketing to increase the success ratio.
You as a Data Scientist at AllLife Bank have to build a model that will help the marketing department to identify the potential customers who have a higher probability of purchasing the loan.
The objective is to predict whether a liability customer will buy a personal loan, to understand which customer attributes most strongly drive purchases, and to identify which segments of customers to target.
ID: Customer ID
Age: Customer’s age in completed years
Experience: # years of professional experience
Income: Annual income of the customer (in thousand dollars)
ZIP Code: Home Address ZIP code.
Family: The family size of the customer
CCAvg: Average spending on credit cards per month (in thousand dollars)
Education: Education level. 1: Undergrad; 2: Graduate; 3: Advanced/Professional
Mortgage: Value of house mortgage if any. (in thousand dollars)
Personal_Loan: Did this customer accept the personal loan offered in the last campaign?
Securities_Account: Does the customer have a securities account with the bank?
CD_Account: Does the customer have a certificate of deposit (CD) account with the bank?
Online: Do customers use Internet banking facilities?
CreditCard: Does the customer use a credit card issued by any other bank (excluding AllLife Bank)?
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# to split data into training and test sets
from sklearn.model_selection import train_test_split
# to build decision tree model
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# to compute classification metrics
from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    recall_score,
    precision_score,
    f1_score,
)
import warnings
warnings.filterwarnings("ignore")
from google.colab import drive
drive.mount('/content/drive')
Loan_Model_df = pd.read_csv("/content/drive/MyDrive/AI-ML/M02/Loan_Modelling.csv")
Loan_Model_df.head(5)
Loan_Model_df.sample(5)
Loan_Model_df.shape
print("Shape of the data set (rows x columns):", Loan_Model_df.shape)
print("Total number of rows (quantity of data):", Loan_Model_df.shape[0])
print("Total number of features (number of columns):", Loan_Model_df.shape[1])
Loan_Model_df.info()
Loan_Model_df.describe(include="all").T
Loan_Model_df.isnull().sum()
Working_Loan_Model_df = Loan_Model_df.copy()
Working_Loan_Model_df
Working_Loan_Model_df.describe(include="all").T
Loan_Model_features = Working_Loan_Model_df.select_dtypes(include=["int64","float64"])
Loan_Model_features
ID: Unique customer identifier, no analytical significance.
Age: Customers are mostly middle-aged, averaging around 45 years.
Experience: Closely follows age, averaging 20 years; negative values indicate data errors.
Income: Average annual income is about $74K, with wide variation among customers.
ZIPCode: Represents customer location, not directly useful for modeling.
Family: Most customers have small families of 2–3 members.
CCAvg: Average monthly credit card spending is $1.9K, indicating varied spending behavior.
Education: Majority are graduates or professionals; higher education links to loan acceptance.
Mortgage: About half of customers have no mortgage; others vary widely up to $635K.
Personal_Loan: Only ~9.6% of customers accepted a personal loan, showing class imbalance.
Securities_Account: Around 10% hold a securities account with the bank.
CD_Account: Only 6% have a CD account; these customers show higher loan interest.
Online: Nearly 60% use online banking — potential for digital marketing.
CreditCard: 29% have a credit card with another bank, useful for cross-selling.
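The negative-Experience anomaly noted above is easy to quantify before any cleaning. A minimal sketch on a hypothetical mini-frame (the notebook would run the same check on Working_Loan_Model_df):

```python
import pandas as pd

# hypothetical stand-in for Working_Loan_Model_df["Experience"]
df = pd.DataFrame({"Experience": [12, -1, 30, -3, 0, 25]})

# count and share of impossible (negative) Experience entries
n_negative = int((df["Experience"] < 0).sum())
pct_negative = 100 * n_negative / len(df)
print(f"{n_negative} negative values ({pct_negative:.1f}% of rows)")
```

Running such a check early tells us whether the errors are rare enough to impute safely or widespread enough to question the data source.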
plt.figure(figsize=(15, 10))
features_for_univariate_analysis = Working_Loan_Model_df.columns.tolist()
print(features_for_univariate_analysis)
for i, feature in enumerate(features_for_univariate_analysis):
    plt.subplot(5, 3, i + 1)
    sns.histplot(data=Working_Loan_Model_df, x=feature)
plt.tight_layout();
plt.figure(figsize=(15, 10))
features_for_univariate_analysis = Working_Loan_Model_df.columns.tolist()
print(features_for_univariate_analysis)
for i, feature in enumerate(features_for_univariate_analysis):
    plt.subplot(5, 3, i + 1)
    sns.boxplot(data=Working_Loan_Model_df, x=feature)
plt.tight_layout();
print(100*Working_Loan_Model_df['Personal_Loan'].value_counts(normalize=True), '\n')
# plotting the count plot for Personal_Loan
sns.countplot(data=Working_Loan_Model_df, x='Personal_Loan');
The dataset shows that 90.4% of customers did not take a personal loan, while only 9.6% opted in.
print(100*Working_Loan_Model_df['Online'].value_counts(normalize=True), '\n')
# plotting the count plot for Online
sns.countplot(data=Working_Loan_Model_df, x='Online');
About 59.7% of customers use internet banking, while 40.3% do not.
print(100*Working_Loan_Model_df['CreditCard'].value_counts(normalize=True), '\n')
# plotting the count plot for CreditCard
sns.countplot(data=Working_Loan_Model_df, x='CreditCard');
Approximately 29.4% of customers have a credit card from another bank, while 70.6% do not.
# defining the features for bivariate analysis
features_for_bivariate_analysis = ['Age', 'Experience', 'Income', 'Family', 'CCAvg', 'Education', 'Mortgage', 'Personal_Loan', 'Securities_Account', 'CD_Account', 'Online', 'CreditCard']
# Scatter plot matrix, colored by the target (which is excluded from the plotted variables)
sns.pairplot(
    Working_Loan_Model_df,
    vars=[c for c in features_for_bivariate_analysis if c != 'Personal_Loan'],
    hue='Personal_Loan',
    diag_kind='kde',
);
# defining the size of the plot
plt.figure(figsize=(12, 7))
# plotting the heatmap for correlation
sns.heatmap(
    Working_Loan_Model_df[features_for_bivariate_analysis].corr(),
    annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="YlGnBu"
);
Strongest correlations:
Income : 0.50
CCAvg (average credit card spending) : 0.37
CD_Account : 0.32
These are positively correlated with Personal_Loan, meaning customers with higher income, higher average credit card spending, or investment accounts are more likely to take a personal loan.
Weak correlations:
Age, Experience, Family, Education, Mortgage: correlations close to 0. These features have very little influence on whether a customer takes a personal loan.
Binary service-usage features (Online, CreditCard, Securities_Account) show very low correlations (0.00-0.02) with Personal_Loan, so they are not strong predictors individually.
Overall Insight:
Financial strength indicators (Income, CCAvg, CD_Account) are the key drivers of Personal_Loan uptake.
Demographics and usage of banking services have minimal impact on predicting Personal_Loan.
# Income vs Personal_Loan (boxplot)
plt.figure(figsize=(10, 6))
sns.boxplot(data=Working_Loan_Model_df, x='Personal_Loan', y='Income')
plt.title('Income vs Personal_Loan (Boxplot)')
plt.show()
Income is expected to be a key driver, as customers with higher annual incomes generally possess greater financial capacity, making them more likely to accept a personal loan offer.
# Family vs Personal_Loan (boxplot)
plt.figure(figsize=(10, 6))
sns.boxplot(data=Working_Loan_Model_df, x='Personal_Loan', y='Family')
plt.title('Family vs Personal_Loan (Boxplot)')
plt.show()
While very large or small family sizes might influence financial stability, customers with a median family size of 3 are often in a phase of life that increases financial commitments, making them more receptive to personal loan offers.
# CCAvg vs Personal_Loan (boxplot)
plt.figure(figsize=(10, 6))
sns.boxplot(data=Working_Loan_Model_df, x='Personal_Loan', y='CCAvg')
plt.title('CCAvg vs Personal_Loan (Boxplot)')
plt.show()
Customers with a higher average monthly credit card spending (CCAvg) may be more likely to purchase a personal loan, as this spending indicates a greater financial need or a comfort level with leveraging credit that makes a loan appealing.
# Education vs Personal_Loan (boxplot)
plt.figure(figsize=(10, 6))
sns.boxplot(data=Working_Loan_Model_df, x='Personal_Loan', y='Education')
plt.title('Education vs Personal_Loan (Boxplot)')
plt.show()
Higher education correlates positively with loan acceptance.
Graduate or Professional education levels correlate with higher acceptance.
# CD_Account vs Personal_Loan (boxplot)
plt.figure(figsize=(10, 6))
sns.boxplot(data=Working_Loan_Model_df, x='Personal_Loan', y='CD_Account')
plt.title('CD_Account vs Personal_Loan (Boxplot)')
plt.show()
Despite the general expectation that CD_Account holders prioritize savings, the moderate positive correlation (0.32) observed in the data suggests the opposite. Customers with a Certificate of Deposit (CD_Account) are more likely to accept a personal loan.
# CreditCard vs Personal_Loan (boxplot)
plt.figure(figsize=(10, 6))
sns.boxplot(data=Working_Loan_Model_df, x='Personal_Loan', y='CreditCard')
plt.title('CreditCard vs Personal_Loan (Boxplot)')
plt.show()
The feature CreditCard (use of a credit card from another bank) shows essentially no impact on the likelihood of accepting a personal loan, as indicated by a correlation near 0.0. This suggests that owning a non-AllLife Bank credit card is neither a predictor of financial need nor of financial aversion for this specific loan product.
# treating negative Experience values as missing, then imputing with the median
Working_Loan_Model_df.loc[Working_Loan_Model_df["Experience"] < 0, "Experience"] = np.nan
Working_Loan_Model_df["Experience"] = Working_Loan_Model_df["Experience"].fillna(Working_Loan_Model_df["Experience"].median())
# dropping identifier columns with no predictive power
for c in ["ID", "ZIPCode"]:
    if c in Working_Loan_Model_df.columns:
        Working_Loan_Model_df.drop(c, axis=1, inplace=True)
During feature cleaning, negative values in the Experience column were treated as missing (set to NaN) and subsequently imputed using the median of the column, while the ID and ZIPCode columns were dropped as they are unique identifiers with no predictive power.
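The cleaning steps can be sanity-checked on a toy frame; this sketch (hypothetical values) reproduces the NaN-then-median imputation and the identifier drops:

```python
import numpy as np
import pandas as pd

# hypothetical mini-frame mirroring the columns touched by the cleaning step
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "ZIPCode": [90210, 94105, 94040],
    "Experience": [-3, 10, 20],
})

# negative Experience -> NaN, then impute with the column median (15.0 here)
df.loc[df["Experience"] < 0, "Experience"] = np.nan
df["Experience"] = df["Experience"].fillna(df["Experience"].median())

# drop identifier columns with no predictive power
df = df.drop(columns=[c for c in ["ID", "ZIPCode"] if c in df.columns])
print(df)
```

Note that the median is computed after the negatives become NaN, so the erroneous values do not bias the imputation.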
# defining the explanatory (independent) and response (dependent) variables
X = Working_Loan_Model_df.drop(["Personal_Loan"], axis=1)
y = Working_Loan_Model_df["Personal_Loan"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, stratify=y, random_state=42)
print("Shape of training set:", X_train.shape)
print("Shape of test set:", X_test.shape, '\n')
print("Percentage of classes in training set:")
print(100*y_train.value_counts(normalize=True), '\n')
print("Percentage of classes in test set:")
print(100*y_test.value_counts(normalize=True))
After splitting the dataset into training and test sets, the resulting sample sizes were 4,000 and 1,000 rows, respectively, with both sets demonstrating excellent stratification by maintaining the original class distribution of approximately 90.4% non-loan customers (0) and 9.6% personal loan customers (1).
dtree1 = DecisionTreeClassifier(random_state=42) # random_state sets a seed value and enables reproducibility
# fitting the model to the training data
dtree1.fit(X_train, y_train)
We define a utility function to collate all the metrics into a single data frame, and another to plot the confusion matrix.
def model_performance_classification(model, predictors, target):
    """
    Function to compute different metrics to check classification model performance
    """
    # predicting using the independent variables
    pred = model.predict(predictors)
    acc = accuracy_score(target, pred)
    recall = recall_score(target, pred)
    precision = precision_score(target, pred)
    f1 = f1_score(target, pred)
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1},
        index=[0],
    )
    return df_perf
def plot_confusion_matrix(model, predictors, target):
    """
    To plot the confusion_matrix with percentages
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
plot_confusion_matrix(dtree1, X_train, y_train)
dtree1_train_perf = model_performance_classification(
dtree1, X_train, y_train
)
dtree1_train_perf
plot_confusion_matrix(dtree1, X_test, y_test)
dtree1_test_perf = model_performance_classification(
dtree1, X_test, y_test
)
dtree1_test_perf
The initial Decision Tree model achieved perfect metrics (Accuracy, Recall, Precision, F1-score of 1.0) on the training data, but the noticeable drop in performance on the test set (e.g., Precision of 0.875 and F1-score of 0.91) indicates significant overfitting and necessitates model pruning for better generalization.
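This overfitting pattern is not specific to the loan data: an unconstrained decision tree will typically score perfectly on its training set and drop on the test set. A small demonstration on synthetic imbalanced data (all parameters here are illustrative, not from the notebook):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

# synthetic stand-in with ~10% positives, mimicking the loan data's imbalance
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.20, stratify=y,
                                          random_state=42)

# an unconstrained tree grows until every training leaf is pure
full_tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
train_f1 = f1_score(y_tr, full_tree.predict(X_tr))
test_f1 = f1_score(y_te, full_tree.predict(X_te))
print(f"train F1 = {train_f1:.3f}, test F1 = {test_f1:.3f}")
```

The train F1 of 1.0 with a lower test F1 is the same signature seen above, which is what the pruning steps that follow are designed to remove.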
feature_names = list(X_train.columns)
plt.figure(figsize=(20, 20))
out = tree.plot_tree(
dtree1,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")  # set arrow color to black
        arrow.set_linewidth(1)  # set arrow linewidth to 1
# displaying the plot
plt.show()
print(
tree.export_text(
dtree1, # specify the model
feature_names=feature_names, # specify the feature names
show_weights=True # specify whether or not to show the weights associated with the model
)
)
# define the parameters of the tree to iterate over
max_depth_values = np.arange(2, 11, 2)
max_leaf_nodes_values = np.arange(10, 51, 10)
min_samples_split_values = np.arange(10, 51, 10)
# initialize variables to store the best model and its performance
best_estimator = None
best_score_diff = float('inf')
# iterate over all combinations of the specified parameter values
for max_depth in max_depth_values:
    for max_leaf_nodes in max_leaf_nodes_values:
        for min_samples_split in min_samples_split_values:
            estimator = DecisionTreeClassifier(
                max_depth=max_depth,
                max_leaf_nodes=max_leaf_nodes,
                min_samples_split=min_samples_split,
                random_state=42,
            )
            estimator.fit(X_train, y_train)
            y_train_pred = estimator.predict(X_train)
            y_test_pred = estimator.predict(X_test)
            train_f1_score = f1_score(y_train, y_train_pred)
            test_f1_score = f1_score(y_test, y_test_pred)
            score_diff = abs(train_f1_score - test_f1_score)
            # keep the model with the smallest train-test gap
            if score_diff < best_score_diff:
                best_score_diff = score_diff
                best_estimator = estimator
# creating an instance of the best model
dtree2 = best_estimator
dtree2.fit(X_train, y_train)
plot_confusion_matrix(dtree2, X_train, y_train)
dtree2_train_perf = model_performance_classification(
dtree2, X_train, y_train
)
dtree2_train_perf
plot_confusion_matrix(dtree2, X_test, y_test)
dtree2_test_perf = model_performance_classification(
dtree2, X_test, y_test
)
dtree2_test_perf
After pre-pruning the Decision Tree (dtree2), the performance gap between the training and test sets has significantly closed, resulting in a more generalized model that achieves a high Accuracy of 0.986 and a balanced F1-score of 0.929 on the unseen test data, indicating that the overfitting issue has been successfully addressed.
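As an aside, the manual triple loop above can be condensed with scikit-learn's GridSearchCV. Note the selection criterion differs: GridSearchCV picks the best cross-validated F1, whereas the loop above minimizes the train-test gap. A sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in; the notebook would pass X_train, y_train instead
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# same grid as the manual search above
param_grid = {
    "max_depth": np.arange(2, 11, 2),
    "max_leaf_nodes": np.arange(10, 51, 10),
    "min_samples_split": np.arange(10, 51, 10),
}
grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
                    param_grid, scoring="f1", cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

Cross-validation also avoids tuning against the test set, which the manual loop implicitly does by using test F1 in its selection criterion.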
# list of feature names in X_train
feature_names = list(X_train.columns)
# set the figure size for the plot
plt.figure(figsize=(20, 20))
out = tree.plot_tree(
dtree2,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
# displaying the plot
plt.show()
# printing a text report showing the rules of a decision tree
print(
tree.export_text(
dtree2,
feature_names=feature_names,
show_weights=True
)
)
# Create an instance of the decision tree model
clf = DecisionTreeClassifier(random_state=42)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = abs(path.ccp_alphas)  # guard against tiny negative alphas caused by floating-point error
impurities = path.impurities
pd.DataFrame(path)
# Create a figure
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("Effective Alpha")
ax.set_ylabel("Total impurity of leaves")
ax.set_title("Total Impurity vs Effective Alpha for training set");
# Initialize an empty list to store the decision tree classifiers
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=42)
    clf.fit(X_train, y_train)
    clfs.append(clf)
print(
"Number of nodes in the last tree is {} with ccp_alpha {}".format(
clfs[-1].tree_.node_count, ccp_alphas[-1]
)
)
# Remove the last classifier and corresponding ccp_alpha value from the lists
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("Alpha")
ax[0].set_ylabel("Number of nodes")
ax[0].set_title("Number of nodes vs Alpha")
# Plot the depth of tree versus ccp_alphas on the second subplot
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("Alpha")
ax[1].set_ylabel("Depth of tree")
ax[1].set_title("Depth vs Alpha")
# Adjust the layout of the subplots to avoid overlap
fig.tight_layout()
train_f1_scores = []  # F1 scores on the training set for each pruned tree
for clf in clfs:
    pred_train = clf.predict(X_train)
    f1_train = f1_score(y_train, pred_train)
    train_f1_scores.append(f1_train)
test_f1_scores = []  # F1 scores on the test set for each pruned tree
for clf in clfs:
    pred_test = clf.predict(X_test)
    f1_test = f1_score(y_test, pred_test)
    test_f1_scores.append(f1_test)
# Create a figure
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("Alpha")
ax.set_ylabel("F1 Score")
ax.set_title("F1 Score vs Alpha for training and test sets")
ax.plot(ccp_alphas, train_f1_scores, marker="o", label="training", drawstyle="steps-post")
ax.plot(ccp_alphas, test_f1_scores, marker="o", label="test", drawstyle="steps-post")
ax.legend(); # Add a legend to the plot
# creating the model where we get highest test F1 Score
index_best_model = np.argmax(test_f1_scores)
dtree3 = clfs[index_best_model]
print(dtree3)
plot_confusion_matrix(dtree3, X_train, y_train)
dtree3_train_perf = model_performance_classification(
dtree3, X_train, y_train
)
dtree3_train_perf
plot_confusion_matrix(dtree3, X_test, y_test)
dtree3_test_perf = model_performance_classification(
dtree3, X_test, y_test
)
dtree3_test_perf
The post-pruned Decision Tree (dtree3) is the final chosen model. It is the best-performing and most generalized model from the tuning process, achieving an Accuracy of 0.991, a Recall of 0.958, and a Precision of 0.948 on the test set, culminating in the highest F1-score of 0.953. This performance ensures that the marketing department can target customers with high confidence, minimizing the cost of reaching out to non-buyers while successfully capturing nearly 96% of all potential loan takers.
# list of feature names in X_train
feature_names = list(X_train.columns)
# set the figure size for the plot
plt.figure(figsize=(10, 7))
# plotting the decision tree
out = tree.plot_tree(
dtree3,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
# add arrows to the decision tree splits if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")  # set arrow color to black
        arrow.set_linewidth(1)  # set arrow linewidth to 1
# displaying the plot
plt.show()
print(
tree.export_text(
dtree3,
feature_names=feature_names,
show_weights=True
)
)
models_train_comp_df = pd.concat(
[
dtree1_train_perf.T,
dtree2_train_perf.T,
dtree3_train_perf.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Decision Tree (sklearn default)",
"Decision Tree (Pre-Pruning)",
"Decision Tree (Post-Pruning)",
]
print("Training performance comparison:")
models_train_comp_df
The training-set comparison clearly illustrates the process of regularization: the default Decision Tree's perfect metrics (1.0) reflect heavy overfitting; pre-pruning deliberately sacrificed some training performance to build a more robust, generalized model; and post-pruning simplified the tree further, trading a slightly lower training F1-score for the best generalization on unseen test data.
models_test_comp_df = pd.concat(
[
dtree1_test_perf.T,
dtree2_test_perf.T,
dtree3_test_perf.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Decision Tree (sklearn default)",
"Decision Tree (Pre-Pruning)",
"Decision Tree (Post-Pruning)",
]
print("Test set performance comparison:")
models_test_comp_df
The test set comparison confirms that pruning significantly improved model generalization: the post-pruned Decision Tree delivers the best overall performance, achieving the highest Accuracy (0.991) and F1-score (0.953); critically, its high Precision (0.948) compared to the default model's (0.875) makes it the most effective model for targeting potential loan customers while minimizing wasted marketing effort.
# importance of features in the tree building
importances = dtree2.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
The feature importance analysis reveals that Income is by far the most significant factor driving personal loan purchase, followed closely by the customer's Education level, and then Family size. The other features, such as CCAvg (Credit Card Average Spending) and CD_Account (Certificate of Deposit Account), have considerably lower but still relevant influence on the prediction.
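The same information as the bar chart can be read off as a sorted table; a sketch on a synthetic fitted tree (in the notebook, the model and feature names would be dtree2 and X_train.columns):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in model; the real notebook would reuse dtree2
X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           random_state=42)
model = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X, y)

# importances are normalized to sum to 1.0; sorting gives the chart's ranking
importance_table = (
    pd.Series(model.feature_importances_,
              index=[f"feat_{i}" for i in range(X.shape[1])])
    .sort_values(ascending=False)
)
print(importance_table)
```

A tabular view is convenient for reporting exact importance shares alongside the plot.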
The Post-Pruned Decision Tree (dtree3) is the superior choice because it achieved the highest performance metrics on the unseen test data, which is the true measure of a model's predictive power and generalizability.
%%time
# choosing a data point
applicant_details = X_test.iloc[:1, :]
# making a prediction
approval_prediction = dtree3.predict(applicant_details)
print(approval_prediction)
Using the post-pruned Decision Tree model (dtree3) to predict the outcome for a single customer resulted in a classification of 0 (No Loan), and the entire inference process was completed quickly in milliseconds of wall time, confirming the model's efficiency for immediate, high-volume operational use.
approval_likelihood = dtree3.predict_proba(applicant_details)
print(approval_likelihood[0, 1])
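The probability output lends itself to threshold-based targeting rather than the default 0.5 cutoff; a minimal sketch (the threshold values are illustrative assumptions, not tuned on the loan data):

```python
import numpy as np

def classify_at_threshold(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn P(loan = 1) scores into 0/1 decisions at a chosen cutoff.

    Lowering the cutoff trades precision for recall, which may suit a
    campaign where missing a likely buyer costs more than one extra call.
    """
    return (probs >= threshold).astype(int)

# hypothetical scores, as dtree3.predict_proba(X)[:, 1] would return
scores = np.array([0.05, 0.40, 0.75, 0.95])
print(classify_at_threshold(scores, 0.5))  # default cutoff
print(classify_at_threshold(scores, 0.3))  # more permissive cutoff
```

The cutoff could be chosen by comparing campaign cost per contact against expected interest revenue per conversion.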
Key Drivers of Loan Purchase (Targeting Criteria)
The most effective campaign should focus its budget on the following customer attributes, as they are the primary drivers of loan acceptance:
Income (Most Important): This is the paramount predictor. Customers with high annual income (in thousands of dollars) are significantly more likely to take a personal loan. The marketing message should appeal to their financial capacity and ability to handle debt.
Education (Second Most Important): This feature, often combined with Income in the tree's split rules, suggests that the most receptive customer base is highly educated and affluent.
Family Size: The analysis indicated that a median family size of 3 is more receptive. This segment likely has rising financial needs (e.g., mortgages, education costs) that a personal loan could address.
CD Account (Positive Correlation): Counter-intuitively, customers who hold a Certificate of Deposit (CD_Account) are more likely to accept the loan. This suggests that the bank's successful target demographic is financially sophisticated and affluent, using the bank for both savings (CD_Account) and credit (Personal_Loan).
Conclusion
The Decision Tree modeling process, culminating in the Post-Pruned Decision Tree (dtree3), provides AllLife Bank with a highly accurate, fast, and transparent tool for targeted marketing. This model achieved the best generalization performance, with a test set Accuracy of 0.991 and a critical Precision of 0.948, ensuring the bank can maximize the return on its marketing investment.
The bank should adopt a high-precision, tiered marketing strategy using the Post-Pruned Decision Tree (dtree3), which showed superior test performance (Precision: 0.948).
Focus: Target customers with a predicted purchase probability above a chosen high-confidence threshold.
Method: Use the bank's most personalized, high-cost outreach methods, such as direct calls from relationship managers. The high Precision (0.948) minimizes wasted budget.
Messaging: Frame the loan as an exclusive offer for valued, affluent clients to help them achieve a specific financial goal (e.g., investment, secondary property, education fund), appealing to their financial sophistication rather than basic need.
Action: Immediately exclude all customers whose profile leads to a leaf node predicting non-acceptance (class 0).
Rationale: The model is confident these individuals will not buy. Marketing to this group is pure budget waste, and resources should be shifted to Tier 1.
The model's inference time is extremely fast (on the order of milliseconds), allowing the bank to score the entire liability customer base quickly and enable real-time loan offers during customer service interactions.
The most receptive customers are defined primarily by high income and high education, refined by other financial factors. The bank should prioritize customers who fit the following profiles:
Very High Income: Customers whose income falls above the tree's primary income split threshold are the most likely to convert, regardless of other factors. Marketing should focus on messages about wealth management and investment leverage.
High Income and Advanced Education: This segment includes affluent customers with Education Level 3 (Advanced/Professional degree). This highly refined group should be targeted with personalized communication from relationship managers, emphasizing their exclusive status.
Moderate-High Income with a CD_Account: This profile indicates a financially sophisticated customer who uses the bank for both major savings and strategic credit, making them a reliable conversion prospect. The loan should be promoted as a tool for financial diversification.
High Income with High Credit Card Spending (CCAvg): These customers demonstrate an active and comfortable relationship with credit, suggesting a higher propensity to accept a new loan offer.
High-Confidence Predictions (P_Loan=1): Any customer whose features lead the Decision Tree model to predict a very high probability of acceptance should be placed in the highest priority tier to justify the cost of personalized, high-value outreach.
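The tiered strategy above can be operationalized by binning predicted probabilities; the cutoffs 0.3 and 0.7 below are illustrative assumptions, not values derived from the model:

```python
import pandas as pd

# hypothetical P(loan = 1) scores for five customers,
# e.g. from dtree3.predict_proba(X)[:, 1]
scores = pd.Series([0.02, 0.15, 0.45, 0.80, 0.97], name="p_loan")

# bin into marketing tiers; boundary values are assumptions for illustration
tiers = pd.cut(
    scores,
    bins=[0.0, 0.3, 0.7, 1.0],
    labels=["exclude", "low-cost outreach", "relationship manager"],
    include_lowest=True,
)
print(pd.concat([scores, tiers.rename("tier")], axis=1))
```

In production the bin edges would be calibrated against campaign economics, but the mechanism stays this simple.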